Regular expressions

Plain-text searching allows simple matching and replacing of exact text within the editor and should be familiar to most users. Regular expressions instead specify text by its characteristics rather than by its exact characters. For example, to find all URLs in a text file, one can rely on the fact that a URL starts with one of a few known prefixes and thereafter contains only certain characters. Regular expressions allow such items to be specified using a syntax borrowed from tools such as GREP, LEX and YACC.

A regular expression is composed of a sequence of sub-expressions, each taking one of the forms listed in the operators table below. The entire expression may be preceded by ^ to indicate that it can only match at the start of a line, or followed by $ to indicate that it can only match at the end of a line.
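
The effect of the two anchors can be illustrated with Python's re module, whose syntax is similar but not identical to the editor's. This is only a sketch: the sample text is made up, and the re.MULTILINE flag is a Python detail that makes ^ and $ apply to each line, which is assumed here to mirror the editor's line-based behaviour.

```python
import re

# Made-up sample text with three lines.
text = "error: disk full\nno error here\nfatal error"

# ^ restricts the match to the start of a line.
starts = re.findall(r"^error.*", text, flags=re.MULTILINE)

# $ restricts the match to the end of a line.
ends = re.findall(r".*error$", text, flags=re.MULTILINE)

print(starts)  # ['error: disk full']
print(ends)    # ['fatal error']
```

The second line contains "error" too, but neither at its start nor at its end, so neither anchored expression matches it.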

Operators

a+	One or more occurrences of a
a*	Zero or more occurrences of a
a?	Zero or one (i.e. optional) occurrence of a
a{n}	Exactly n occurrences of a
a{n,}	n or more occurrences of a
a{,m}	At most m occurrences of a (including none)
a{n,m}	At least n but not more than m occurrences of a
a|b	Either a or b
a||b	a or b or both a and b in any order
abc	a followed by b followed by c
[abc]	A single character, one of a or b or c

[a-b]	A single character, ranging in value from a to b inclusive
[^abc]	A single character, any except a, b or c
(abc)	a followed by b followed by c
"abc"	The letters a followed by b followed by c with no special significance attached to a, b or c
.	Any character except a newline
\a	The character a, with no special significance attached to it, except for the following special forms:

\t	The tab character
\n	The newline character
\r	The return character
\f	The formfeed character
\b	The backspace character
\xNN	The hex character NN
\0ooo	The octal character ooo
\w	A single character, one of [a-zA-Z0-9_]
\W	Any single character not matching \w
\d	A single character, one of [0-9]
\D	A single character not matching \d

\s	A whitespace character [\t\r\n\f\b\ ]
\S	A single character not matching \s
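
A few of the operators above can be tried out in Python's re module, which shares most of this syntax. This is only a sketch under that assumption: the quoted form and the a||b operator have no direct Python equivalent (re.escape is used here as a stand-in for quoting), and the helper name full_match is made up for this example.

```python
import re

def full_match(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# a{n,m}: at least n but not more than m occurrences of a.
counts = [full_match(r"a{2,3}", s) for s in ("a", "aa", "aaa", "aaaa")]

# [a-c] is a range, [^abc] is a negated set, \w \s \d are shorthand classes.
in_range = full_match(r"[a-c]+", "abcab")
negated = full_match(r"[^abc]", "d")
shorthand = full_match(r"\w+\s\d+", "line 42")

# Stand-in for the quoted form: re.escape removes the special meaning
# of . and * so that the text is matched literally.
literal = re.escape("a.b*c")
quoted = (full_match(literal, "a.b*c"), full_match(literal, "aXbbc"))

print(counts)   # [False, True, True, False]
print(in_range, negated, shorthand)  # True True True
print(quoted)   # (True, False)
```

Without the escaping step, the pattern a.b*c would also match aXbbc, because . and * would keep their special meaning.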

Simple examples
 
Expression	Matches	Does not match

"this"|"that"	this	This
"this"|"that"	that	That
\d{2}\.\d{2}	23.45	2.4
\d{2}\.\d{2}	03.22	0.1
[a-zA-Z_]\w*	Identifier	2Identifiers
\(\*[\x01-\x7F]+\*\)	(* a comment *)	( No comment *)
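
The simple examples can be checked mechanically. The sketch below uses Python's re module as an approximation of the editor's engine. The quoted alternatives are rewritten as plain text because Python has no quoting operator, and each sample is tested as a whole string, which is an assumption about how the "Does not match" column is meant. The names EXAMPLES and check are made up for this illustration.

```python
import re

# (Python pattern, texts expected to match, texts expected not to match)
EXAMPLES = [
    (r"this|that", ["this", "that"], ["This", "That"]),
    (r"\d{2}\.\d{2}", ["23.45", "03.22"], ["2.4", "0.1"]),
    (r"[a-zA-Z_]\w*", ["Identifier"], ["2Identifiers"]),
    (r"\(\*[\x01-\x7F]+\*\)", ["(* a comment *)"], ["( No comment *)"]),
]

def check(pattern, should_match, should_not_match):
    """Return True if every sample behaves as the table claims."""
    regex = re.compile(pattern)
    return (all(regex.fullmatch(s) for s in should_match)
            and not any(regex.fullmatch(s) for s in should_not_match))

results = [check(*row) for row in EXAMPLES]
print(results)  # [True, True, True, True]
```

Note that a plain search (rather than a whole-string match) would find Identifiers inside 2Identifiers, starting at the second character.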

Complex example

("http://"|"mailto:"|"ftp://")[^ \n\r\"\<\\]+

This expression detects internet references that start with 'http://', 'mailto:' or 'ftp://'.

In English, the expression reads: "Find all occurrences of text that start with 'http://', 'mailto:' or 'ftp://' and are followed by at least one character that is not a space, a newline (\n), a carriage return (\r), a double quote (\"), an opening angle bracket (\<), or a backslash (\\)."
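
For comparison, a rough Python translation of the complex example is sketched below. It assumes Python's re module rather than the editor's engine: the quoted prefixes become a plain alternation, and the sample text, which uses the reserved example.com domain, is invented for illustration.

```python
import re

# Prefix alternation followed by one or more characters that are not
# a space, newline, carriage return, double quote, '<' or backslash.
URL_PATTERN = re.compile(r'(?:http://|mailto:|ftp://)[^ \n\r"<\\]+')

# Invented sample text.
sample = (
    'Docs: http://example.com/docs\n'
    'Mail: mailto:someone@example.com\n'
    'Files: <a href="ftp://ftp.example.com/pub">here</a>'
)

found = URL_PATTERN.findall(sample)
print(found)
# ['http://example.com/docs', 'mailto:someone@example.com', 'ftp://ftp.example.com/pub']
```

Each match ends at the first excluded character: a newline for the first two references and a double quote for the third.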


Last change: August 07, 2000 - © Softwareentwicklung Marcus Monnig
